Abstract

Background: Earlier, we identified proteins connecting different disease proteins in the human protein-protein 
interaction network and quantified their mediator role. An analysis of the networks of these mediators shows that 
proteins connecting heart disease and diabetes largely overlap with the ones connecting heart disease and 
obesity. 

Results: We quantified their overlap and, based on the identified topological patterns, inferred the structural
disease-relatedness of several proteins. Literature data provide a functional view of these proteins and support our
findings well. For example, the inferred structurally important role of the PDZ domain-containing protein GIPC1 in
diabetes is supported by the literature, even though this information is missing from the Online Mendelian Inheritance
in Man database. Several key mediator proteins identified here clearly have pleiotropic effects: there is ample
evidence for their general role, although always of only secondary importance.

Conclusions: We suggest that studying central nodes in mediator networks may contribute to better
understanding and quantifying pleiotropy. Network analysis provides potentially useful tools here and can also help
to improve databases.



Background 

The systems perspective on complex biological systems 
emphasizes that individual genes act in genetic networks 
and individual proteins play their roles in protein-pro- 
tein interaction (PPI) networks [1]. There is increasing 
interest in these networks, as their analysis helps to 
understand the relationship between the components (i.e.
genes, proteins) and how these are positioned in the
whole system. Well-connected hubs seem to be of high
functional importance [2,3]. Consequently, studies of
diseases based on PPI networks started by analysing
the centrality of disease proteins. Genes
associated with a particular phenotype or function are 
not randomly positioned in the PPI network, but tend 
to exhibit high connectivity; they may cluster together 
and can occur in central network locations [4,5]. 

Beyond focusing on the number of neighbours of 
graph nodes (their degree), wider neighbourhoods, indir- 
ect effects and larger subsets of nodes can also be 
analyzed by the rich arsenal of network analytical tools.
This non-local information may help, for example, to 
quantify the structural relationships between different 
sets of proteins. In an earlier paper [6], we have deter- 
mined proteins that mediate indirect effects between 
sets of proteins causing five diseases in the human PPI 
network. Their mediator role was quantified and they 
were ranked according to structural importance. Their 
functional role may be of high interest, as proteins 
involved in certain pairs of diseases have no direct
interactions among them [6]. These findings raised an
appealing question: "which proteins connect diseases in
the human PPI network?".

To be connected to diverse regions of the PPI network 
may lend a functionally pleiotropic character to a pro- 
tein in a classical, genetic sense: it has been demon- 
strated that high connectivity correlates well with 
pleiotropic effects [7,8]. The most central mediators are 
especially important in connecting apparently distant 
nodes in the human PPI network. Specific network posi- 
tions may render strange but characteristic behaviour 
(expression pattern) to different proteins [9,10]. Instead 
of being exceptional, these epistatic effects may be of 
primary importance in physiology [11] and in better
understanding animal development and adaptation. 

In this paper, (1) we compare two interaction net- 
works of mediators (mediating indirect effects between 
heart disease and obesity, and between heart disease and 
diabetes), (2) we analyse the structure of these two net- 
works and their aggregated total network, (3) we study 
the overlap between the two mediator networks, and (4) 
we infer biological functions for some proteins and pro- 
vide supporting literature data. All in all, we illustrate 
that network analysis is an excellent tool for identifying 
pleiotropy and epistasis from complex networks 
extracted from multiple databases. 

Results 

Network analysis 

We obtained 9 proteins involved in heart diseases (H), 
as well as 44 and 20 involved in diabetes (D) and obesity 
(O), respectively. The HD network contains N = 2142 
nodes and L = 3537 links, while the HO network con- 
tains N = 1746 nodes and L = 2567 links and the total 
network contains N = 2221 nodes and L = 3686 links. 
Figure 1 provides a schematic illustration of how the
networks were constructed (see Methods). Figure 2
shows the relationships between mediator proteins in
the HD (Figure 2a) and the HO (Figure 2b) networks.
The HD network (Figure 3a) contains 25 HD mediators
and their 2117 neighbours, and the HO network (Figure
3b) contains 12 HO mediators and their 1734 neighbours.
In the "total" network (Figure 4), 9 mediators are shared,
so it contains only 28 mediator proteins. In
this total network, 1667 nodes are present in both the
HD and the HO network, 475 only in the HD and 79
only in the HO network.

The distributions of individual structural indices are
very similar for all three analyzed networks. Additional
file 1 shows all values of the six network indices
for all nodes in the three networks. Figure 5 shows
these distributions only for the total network. Almost
all indices follow a strongly skewed distribution: most
nodes have low values and only a few nodes are extremely
important. While degree (D), topological importance (TI)
and betweenness centrality (BC) have only one or a few
hubs, topological overlap (TO) indicates several key
nodes. Closeness centrality (CC) has a unimodal,
normal-like distribution.

For each network, there seems to be a strong and positive
rank correlation between all centrality indices, but not
for the overlap indices ($TO^3_{0.01}$ and $TO^3_{0.005}$).
TO indices correlate positively and weakly with the other
centrality indices, whereas they correlate negatively and
weakly with CC (see Table 1). D correlates best with
$TI^3$. The TO measure thus offers different, complementary
information compared with the centrality indices.



Table 2 summarizes the results of the randomization
test (note that, for simplicity, only the means are shown
in the table). The observed rank correlation coefficients
are all significantly lower than those for the random
networks (at the 95% confidence level). This means that
the rank correlations between different centrality indices
are stronger in the random networks than in the
HD, HO and total networks. One possible explanation
for this discrepancy is that, beyond their mathematical
properties, real networks are also structured by biological
constraints. Thus, different centrality indices can
capture different aspects of network topology, and therefore
the correlations between different indices are weaker for
real networks. This provides further support for using
various network indices to capture the different topological
properties embedded in real networks.

Biological results 

We now examine more closely the rank order of the top
nodes in each network. The degree ranks for the three
networks are almost identical (see Tables 3, 4 and 5).
The most central nodes are P62993 (Growth factor
receptor-bound protein 2), P63104 (14-3-3 protein zeta/
delta) and P06241 (Tyrosine-protein kinase Fyn). The 9
shared proteins rank in the same order in the HD and HO
networks, and their rank order does not change in the total
network either. In the HO network, the 12 mediators lead the
ranking, followed by their neighbours. However, in
the HD ranking (and also in the total network), there is
one non-mediator protein among the top 26
(alongside the 25 HD mediators); this is P00533 (Epidermal
growth factor receptor) in the 23rd position.

The betweenness ranks correspond quite well to the
degree ranks, with some exceptions. For example, in the
HD network, P06241 (Tyrosine-protein kinase Fyn) is
three positions lower in the betweenness ranking than
in the degree ranking. In the HD network, five
non-mediators (instead of one) are now mixed with
HD mediators at the top of the list, while some HD mediators
such as Q99616 (C-C motif chemokine 13) lose
their high degree-based rank completely. In contrast, the
degree rank order seems to be consistent with its
betweenness counterpart for the HO and total networks.

Despite the large overlap between the HD and HO 
networks, the rank positions of HD and HO mediator 
proteins are quite different in the two networks. For 
example, both P17302 (Gap junction alpha-1 protein) 
and P43405 (Tyrosine-protein kinase SYK) rank high in 
the HD network but not in the HO network. As
shown in Figure 2b, O14908 (PDZ domain-containing
protein GIPC1) is the only one of the three
exclusive HO mediators that is part of the interaction
network of HO mediators.

Figure 1 The HD and HO mediator networks and their subnetworks. Red, blue and yellow proteins are involved in three diseases (H: heart
diseases, D: diabetes, O: obesity). Pink proteins mediate indirect effects between the red and the blue ones, while orange proteins mediate
between the red and the yellow ones. Black proteins mediate between both pairs. White proteins are the non-mediator neighbours of the
mediator proteins. We analyzed five networks: the HD mediator network (pink and black nodes with their white neighbours), the HO mediator
network (orange and black nodes with their white neighbours), the total mediator network (pink, orange and black nodes with their white
neighbours), the subnetwork of interactions among HD mediators (pink and black nodes) and the subnetwork of interactions among HO
mediators (orange and black nodes).



Additional File 2 shows the extracted GO terms of
proteins ranked by different structural indices for the
HD network, the HO network and the total network.
For example, considering the top 30 proteins ranked
by degree in the HD network, we found that half of
them are related to the processes 'intracellular signaling
cascade' (GO:0007242) and 'protein amino acid
phosphorylation' (GO:0006468), whereas in the HO
network 16 of them are located in the 'plasma membrane'
(GO:0005886) and 13 of them are related to the process
'cell surface receptor linked signal transduction'
(GO:0007166).

Figure 2 Subgraphs of the HD (a) and HO (b) networks, showing the interactions only between HD and HO mediators, respectively.

The p-values of proteins quantify their average fit to
the studied GO terms (i.e. to what extent they can be
characterized by a certain functionality). By comparing
those p-values to the centrality and overlap indices used in
this study, we can conclude that the performance of the
different indices varies strongly. In the total network, only
the $TO^3_{0.01}$ index correlates significantly with biological
function (Table 6). Note that the performance of $TO^3_t$
depends on the threshold t used. Proteins in unique
positions are, thus, typically involved in the above-mentioned
key functions. The other relatively well-performing
index is CC, whereas D and BC correlate with
function only once each. $TI^3$ and $TO^3_{0.005}$ do not
correlate with functions defined by GO terms. Furthermore,
functional roles are best predicted by these structural
indices in the HD network and less so in the HO
network.

Discussion 

Based on its centrality ranks, P63104 (14-3-3 protein
zeta/delta), corresponding to the gene YWHAZ, seems to be
the second most important protein in these mediator
processes. This is in agreement with the literature, which
states that P63104 is a chaperone [12] and is richly connected
to several kinds of other molecules with mostly weak
links [13]. Specifically, it is involved in cell growth and
carcinogenesis [14], breast cancer recurrence after
chemotherapy resistance [15], luteal sensitivity to PGF
[16] and, finally, it is part of the antiapoptotic (PI3K/AKT)
and cell proliferation (ERK/MAPK) pathways [17]. Its
connecting position has been demonstrated by network
analysis, showing its involvement in several HSNs
(high-scoring subnetworks [18]). Ogihara et al. [19] suggested
that the association with 14-3-3 protein may play a role
in the regulation of insulin sensitivity by interrupting
the association between the insulin receptor and IRS1. This
suggests that P63104 probably mediates the HD and HO
relationships through the regulation of insulin (as insulin is a
crucial hormone in the human metabolic system). Typically,
it is not directly responsible for diseases (it is not assigned to
any disease in the OMIM database), but it is very frequently
mentioned as a candidate protein in the background,
requiring further investigation [14].

The most important protein, P62993 (Growth factor 
receptor-bound protein 2) corresponding to gene GRB2 
leads in all of the six structural importance ranks. It 
appears in the mammalian Grb2-Ras signaling pathway 
with SH2/SH3 domain interactions and several func- 
tions in embryogenesis and cancer [20]. Zhang et al. 
[21] also found that GRB2 is essential for cardiac hyper- 
trophy and fibrosis in response to pressure overload and 
that different signaling pathways downstream of GRB2 
regulate fibrosis, fetal gene induction, and cardiomyo- 
cyte growth. Yet, in the subgraph of the HO mediators, 
P62993 does not seem to occupy a central position but 
its phenotypic traits are likely to be affected through the 
links to non-mediators instead of other HO mediators. 
This kind of structural arrangement is advantageous for 
information integration, while a strongly connected 
mediator subnetwork implies functional redundancy. 

Among the three exclusive HO (non-HD) mediators,
O14908 (PDZ domain-containing protein GIPC1), corresponding
to the gene GIPC1, appears in the HO mediator
subgraph, while the other two are isolated (Q14232 -
Translation initiation factor eIF-2B subunit alpha, corresponding
to the gene EIF2B1; Q5JY77 - G-protein coupled
receptor-associated sorting protein 1, corresponding to
the gene GPRASP1). This may suggest that O14908 is also an
HD mediator. Its connection to heart disease is clear, but
its interaction with diabetes-related proteins is not
documented in the OMIM database (nor is it for the
other two proteins). However, this inferred function is well
supported by Klammt et al. [22], who report on the role of
O14908 in diabetes. A possible outcome of network analysis
is thus to suggest potential updates to the databases.

The only non-mediator protein that ranks higher than some HD
mediator proteins in the degree-based centrality rank of the HD
network is P00533 (Epidermal growth factor receptor),
corresponding to the gene EGFR. We could speculate
that this protein might also mediate between H and D
proteins. In the total PPI network, it is linked to two D
proteins (Q9UQF2 - JNK-interacting protein 1;
Q9UQQ2 - Signal transduction protein Lnk) but not to
any H protein. EGFR and its ligands are cell signaling
molecules involved in a wide range of cellular functions,
including cell proliferation, differentiation, motility, and
tissue development [23]. Research on the role of EGFR in
pathogenesis has focused on lung cancer [24] and has not
revealed a link to heart diseases. However, Iwamoto
and his colleagues observed the role of ErbB signaling in
heart function [25]. EGFR has also been shown to be a
central protein according to other sophisticated network
analysis techniques [26], dominating the clique composition
of certain pathways.

Figure 5 The distributions of nodal index values in the total network.



Based on our static, structural inference, it is not easy
to decide whether a protein is "strongly linked to a
disease" or whether it is a "disease protein". The definitions
are very poor here. Is P00533 an H protein (causing heart
diseases) or an HD protein (mediating between H and D
proteins)? The solution is to use inference for generating
new hypotheses, improving databases and designing
experiments, instead of regarding the inferred findings
as results.

Conclusions 

Our study focused on only a few diseases, but the
approach and the methods used can be generalized. It
may be interesting to extend this research to other
diseases and to study the pleiotropic effects of mediators
linking other disease pairs. The mediator proteins
analyzed in this study typically have pleiotropic effects.
They connect several pathways and influence several
phenotypic traits. The reason why their inferred structural
roles are missing from the OMIM database is exactly that
they act in a non-Mendelian way. They are typically not
the singular elements of important pathways but weak
connectors among several pathways of high importance.
This way, their effects can be fundamental. Understanding
them requires a multi-locus, systems-based, network
view. As individual pathways are linked into networks, our
non-Mendelian knowledge of linkage, epistasis and
pleiotropy grows. If network analysis makes
these epistatic and pleiotropic effects quantifiable and
predictable, we are getting closer to a better understanding of
delegated complexity [27]. From an application perspective,
it would be interesting to see whether a healthy
(intact and well-connected) network of mediators could
contribute to healthy phenotypes or, on the contrary, whether
disconnecting the mediator network could be used to isolate
diseases and reduce the side-effects of drugs.

Table 1 Correlations between indices of the real networks.
The Spearman rank correlation coefficients between each pair of centrality
indices for the HD and HO networks as well as the total network.



Methods 

Data 

We have analyzed human protein-protein interaction
(PPI) network data extracted from the I2D database.
I2D (Interologous Interaction Database) is an on-line
database of known and predicted mammalian and
eukaryotic protein-protein interactions [28]. It is one of
the most comprehensive sources of known and predicted
eukaryotic PPIs.

Table 2 Correlations between indices of the randomized
networks.
The mean Spearman rank correlation coefficients between each pair of
centrality indices obtained from 1000 random networks of the same size as
the HD, HO and the total network.

We carefully considered the completeness of the PPI
network by investigating various human PPI databases.
The authors of I2D have collected data from
almost all of the well-known human protein interaction
databases, including HPRD http://www.hprd.org/, BIND
http://bind.ca/, MINT http://mint.bio.uniroma2.it/mint/
and IntAct http://www.ebi.ac.uk/intact/, among others.
Those databases are built by a range of methods: some
interactions are experimentally determined, some are
predicted, and some are curated from the literature. By using
the I2D database, we could thus construct a network integrated
from multiple data sources. We also investigated other
databases not included in I2D, particularly the
STRING database http://string.embl.de/, and we found
that almost all high-scoring interactions in STRING were
covered by our data set. Combining data from various
sources is expected to give a more comprehensive basis for
analyzing the PPI network than studying each data source
separately. To obtain a more reliable set of protein
interactions, we excluded all the interactions obtained by
homology methods: only experimentally verified ones
were included in our analysis. For the disease phenotypes,
the clinical Online Mendelian Inheritance in Man database
(OMIM, [29]) was used. We checked
whether we needed to update the data set used in Nguyen
and Jordan [6] and found that we could use the same data
set, as the number of updates is negligible.

Analysis 

From the human PPI network data, we constructed: (1)
a network of proteins mediating indirect effects between
heart disease (H) and diabetes (D) proteins (i.e. HD
mediators) and their direct neighbours (i.e. the HD
network); (2) a network of proteins mediating indirect
effects between heart disease (H) and obesity (O) proteins
(i.e. HO mediators) and their direct neighbours
(i.e. the HO network); and (3) an aggregated network of the
two previous networks (i.e. the total network). We considered
only two-step mediator proteins, directly connected
to two proteins that are related to different diseases and are
otherwise unconnected (so we do not consider chains
of mediators). We have also studied the subnetworks of
(1) and (2) without non-mediator neighbours. See Figure
1 for a schematic illustration of the relationships between
these five networks. Figure 2 shows the subnetworks
without non-mediator neighbours (Figure 2a for HD
and Figure 2b for HO). Figure 3a shows the HD and
Figure 3b shows the HO network. The total network is
shown in Figure 4.
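
The two-step mediator rule can be illustrated with a short Python sketch. It is only one possible reading of the definition, not the procedure used in [6]: the function names, the toy edge list and the choice to exclude disease proteins themselves from the mediator set are assumptions made for illustration.

```python
from itertools import product


def build_adjacency(edges):
    """Build an undirected, unweighted adjacency map from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def two_step_mediators(adjacency, set_a, set_b):
    """Nodes linked to a protein of set_a and a protein of set_b that are
    not themselves directly linked (assumed reading of 'two-step mediator')."""
    mediators = set()
    for node, neighbours in adjacency.items():
        if node in set_a or node in set_b:
            continue  # assumption: disease proteins are not counted as mediators
        a_hits = neighbours & set_a
        b_hits = neighbours & set_b
        if any(a != b and b not in adjacency[a] for a, b in product(a_hits, b_hits)):
            mediators.add(node)
    return mediators


def mediator_network(adjacency, mediators):
    """Node set of a mediator network: the mediators plus their direct neighbours."""
    nodes = set(mediators)
    for m in mediators:
        nodes |= adjacency[m]
    return nodes


# Hypothetical toy data: H1 and D1 are connected only through M1,
# whereas H2 and D2 interact directly, so M2 is not a mediator here.
toy_edges = [("H1", "M1"), ("M1", "D1"), ("M1", "W1"),
             ("H2", "D2"), ("H2", "M2"), ("M2", "D2"), ("H1", "X")]
toy_adjacency = build_adjacency(toy_edges)
toy_mediators = two_step_mediators(toy_adjacency, {"H1", "H2"}, {"D1", "D2"})
```

Applying `mediator_network` to the HD and HO mediator sets and taking the union of the two node sets would give the node set of the aggregated total network.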

Earlier we determined the identity of these HD
and HO mediators and quantified the strength of their
mediator effect [6]. Here, we focus on the networks of
mediators. Links in these networks are undirected (if
protein i is linked to protein j, then j is also linked to i)
and unweighted (we have no data on the intensity or
strength of the interactions). We have characterized
each network by some simple network statistics.

Table 3 Centrality ranks for the HD network.
The rank of the most central 30 nodes in the HD network, based on the six importance indices analyzed.

(i) The simplest index that provides the most local
information about node i is its degree ($D_i$). This is the
number of other nodes connected directly to node i. We
have calculated the normalized degree:

$$nD_i = \frac{D_i}{N - 1} \qquad (1)$$

where N is the number of nodes in the network.

(ii) A measure of positional importance quantifies how
frequently a node i is on the shortest path between
every pair of nodes j and k. This index is called
"betweenness centrality" ($BC_i$) and it is used routinely in
network analysis [30]. The normalized betweenness
centrality index for a node i ($nBC_i$) is:

$$nBC_i = \frac{2 \times \sum_{j<k} g_{jk}(i) / g_{jk}}{(N - 1)(N - 2)} \qquad (2)$$

where $i \neq j$ and $k$, $g_{jk}$ is the number of equally shortest
paths between nodes j and k, and $g_{jk}(i)$ is the number
of these shortest paths to which node i is incident ($g_{jk}$
may equal one). The denominator is twice the number
of pairs of nodes without node i. This index thus measures
how central a node is in the sense of being incident
to many shortest paths in the network.
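
As a worked illustration of the normalized betweenness index, the following Python sketch counts shortest paths by breadth-first search and accumulates the path fractions with Brandes' algorithm. It is a minimal sketch for small undirected, unweighted graphs and is not meant to reproduce the software used for the calculations; the function name and the adjacency-map input format are assumptions.

```python
from collections import deque


def normalized_betweenness(adjacency):
    """nBC_i = 2 * sum_{j<k} g_jk(i)/g_jk / ((N-1)(N-2)) for an undirected graph."""
    nodes = list(adjacency)
    n = len(nodes)
    raw = {v: 0.0 for v in nodes}  # sums over ORDERED pairs (j, k)
    for source in nodes:
        # Breadth-first search: distances, shortest-path counts, predecessors.
        dist = {source: 0}
        n_paths = {v: 0 for v in nodes}
        n_paths[source] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Back-propagate the fraction of shortest paths passing through each node.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                raw[w] += dependency[w]
    pairs_twice = (n - 1) * (n - 2)
    if pairs_twice == 0:
        return {v: 0.0 for v in nodes}
    # raw counts every unordered pair twice, so raw/2 is the sum over j<k;
    # multiplying by the factor 2 of the formula cancels that division.
    return {v: raw[v] / pairs_twice for v in nodes}


# Toy example: a star with centre "c"; every leaf-to-leaf path passes through "c".
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
star_bc = normalized_betweenness(star)
```

In the star, the centre lies on all three leaf-to-leaf shortest paths, so its normalized value is 2 x 3 / (3 x 2) = 1, while the leaves score 0.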

Table 4 Centrality ranks in the HO network.
The rank of the most central 30 nodes in the HO network, based on the six importance indices analyzed.

(iii) "Closeness centrality" ($CC_i$) is a measure quantifying
how short the minimal paths are from a given node i to all
others [30]. The normalized index for a node i ($nCC_i$) is:

$$nCC_i = \frac{N - 1}{\sum_{j=1}^{N} d_{ij}} \qquad (3)$$

where $i \neq j$, and $d_{ij}$ is the length of the shortest path
between nodes i and j in the network. This index thus
measures how close a node is to others. The larger $nCC_i$
is for node i, the more directly its deletion will affect the
majority of other nodes.
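
The normalized closeness index can be computed from breadth-first search distances, as in the following minimal Python sketch. It assumes a connected, undirected, unweighted graph given as an adjacency map; how disconnected graphs should be treated is not specified here, so the sketch simply raises an error in that case.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def normalized_closeness(adjacency):
    """nCC_i = (N - 1) / sum_j d_ij, assuming a connected graph with N >= 2."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("at least two nodes are needed")
    result = {}
    for node in adjacency:
        dist = bfs_distances(adjacency, node)
        if len(dist) < n:
            raise ValueError("graph must be connected")
        result[node] = (n - 1) / sum(dist.values())
    return result


# Toy example: on the path a - b - c, the middle node is closest to the others.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_cc = normalized_closeness(path)
```

For the middle node the distances sum to 2, giving 2/2 = 1; for an end node they sum to 3, giving 2/3.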

(iv) Topological importance can also be quantified
by general matrix algebra. In an undirected network,
we define $a_{n,ij}$ as the effect of j on i when i can be
reached from j in n steps. The simplest way of calculating
$a_{n,ij}$ is when n = 1 (i.e. the effect of j on i in 1
step):

$$a_{1,ij} = \frac{1}{D_i} \qquad (4)$$

where $D_i$ is the degree of node i (i.e. the number of
its direct neighbours). We assume that indirect effects
are multiplicative and additive. For instance, we wish
to determine the effect of j on i in 2 steps, and there
are two such 2-step pathways from j to i: one is
through k and the other is through h. The effect of j
on i through k is defined as the product of two direct
effects (i.e. $a_{1,kj} \times a_{1,ik}$), therefore the term
multiplicative. Similarly, the effect of j on i through h equals
$a_{1,hj} \times a_{1,ih}$. To determine the 2-step effect of j on i
($a_{2,ij}$), we simply sum up those two individual 2-step
effects:

$$a_{2,ij} = a_{1,kj} \cdot a_{1,ik} + a_{1,hj} \cdot a_{1,ih} \qquad (5)$$

and therefore the term additive. When the effect of
step n is considered, we define the effect received by
node i from all other nodes in the same network as:

$$\psi_{n,i} = \sum_{j=1}^{N} a_{n,ij} \qquad (6)$$

which is equal to 1 (i.e. each node is affected by the
same unit effect). Furthermore, we define the n-step
effect originated from node i as:



Nguyen et al. BMC Systems Biology 2011, 5:179
http://www.biomedcentral.com/1752-0509/5/179



Table 5 Centrality ranks in the total network. 
The rank of the most central 30 nodes in the total network, based on the six importance indices analyzed.



Table 6 Correlations between p-values and centrality.

            nD      nCC     nBC     TI^3    TO^3_0.01  TO^3_0.005
HD/D        0.2568  0.3077  0.2635  0.2577  0.3365     0.2321
HD/TI       0.2467  0.2216  0.2062  0.1677  0.4971     0.3017
HD/TO       0.3957  0.432   0.3993  0.3775  0.3803     0.3386
HO/D        0.1501  0.144   0.1767  0.1925  0.1766     0.1487
HO/TI       0.0966  0.0007  0.0919  0.1127  0.1467     0.1254
HO/TO       0.3064  0.4051  0.3396  0.3211  0.3446     0.251
total/D     0.2687  0.2605  0.2136  0.1853  0.4245     0.3045
total/TI    0.2687  0.2605  0.2136  0.1853  0.4245     0.3045
total/TO    0.3734  0.3692  0.3599  0.3372  0.4258     0.3092

The Spearman rank correlation coefficients between the p-values of GO terms calculated for the most central nodes according to particular indices in particular
networks and the node centrality values of the nodes. Bold numbers mean p < 0.05.

$$\sigma_{n,i} = \sum_{j=1}^{N} a_{n,ji} \qquad (7)$$

which may vary among different nodes (i.e. effects originated
from different nodes may be different). Here, we
define the topological importance of node i when effects
"up to" n steps are considered as:

$$TI_i^n = \frac{\sum_{m=1}^{n} \sigma_{m,i}}{n} = \frac{\sum_{m=1}^{n} \sum_{j=1}^{N} a_{m,ji}}{n} \qquad (8)$$

which is simply the sum of effects originated from
node i up to n steps (one plus two plus three ... up to n)
averaged over the maximum number of steps considered
(i.e. n). This $TI^n$ index measures the positional
importance of a node by considering how effects originated
from such a given node can spread through the
whole network to reach all nodes after a pre-defined n
step length [31]. Calculations were performed by the
CosBiLAB Graph software [32].
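
Equations (4) to (8) amount to taking powers of a row-normalized adjacency matrix and summing its columns. The following Python sketch illustrates this on a toy graph; it is an independent, minimal illustration using plain lists (the function names are assumptions), it is not the CosBiLAB Graph implementation, and the naive matrix product would be slow for networks of a few thousand nodes.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def topological_importance(adjacency, n_steps=3):
    """TI_i^n: effects originating from node i, summed over 1..n steps, divided by n."""
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    # One-step effects: a1[i][j] = 1/D_i if j is a neighbour of i (equation 4).
    a1 = [[0.0] * size for _ in range(size)]
    for v, neighbours in adjacency.items():
        for w in neighbours:
            a1[index[v]][index[w]] = 1.0 / len(neighbours)
    current = [row[:] for row in a1]
    totals = [0.0] * size
    for step in range(n_steps):
        if step > 0:
            # m-step effects are products summed over all pathways (equation 5).
            current = matmul(current, a1)
        for i in range(size):
            # sigma_{m,i}: column sum, i.e. effects received by all j from i (equation 7).
            totals[i] += sum(current[j][i] for j in range(size))
    return {v: totals[index[v]] / n_steps for v in nodes}


# Toy example: a star with centre "c" and three leaves.
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
ti_1 = topological_importance(star, n_steps=1)
ti_2 = topological_importance(star, n_steps=2)
```

Because every row of the effect matrix sums to 1, the TI values of all nodes always add up to N, which gives a quick sanity check on any implementation.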

(v) Basically every node in a network is connected to
every other node, but it still matters how strongly they are
connected (whether two nodes are neighbors in the network,
second neighbors or more distant ones). Thus, it
is of interest to study the indirect neighborhood of particular
nodes, considering more than only the neighbors
but less than the whole network. For a given step length
n and a given network, there is an interaction matrix
presenting the relative strengths of interactions between
each pair of nodes i and j. We note that interaction
strength is used here in a totally structural sense, with
no dynamical component. If n exceeds 2 or 3, and the
network is not very large, then there is non-zero interaction
strength between each pair of nodes (everything
is connected to everything else). Thus, an effect threshold
(t) can be set, determining the "effective range" of
the interaction structure of a given graph node i, and
nodes within this effective range are defined as strong
interactors of i (i.e. effects received from i being greater
than t) whereas nodes outside this range are defined as
i's weak interactors (effects received from i being less than
t). Since the sets of strong interactors of two or more
nodes may overlap, it is possible to quantify this overlap
(the number of shared strong interactors) in order to
measure the positional uniqueness of individual graph
nodes. The topological overlap between nodes i and j up
to n steps ($TO^{n}_{t,ij}$) is the number of strong interactors
appearing in both i's and j's effective ranges determined
by the threshold t. The sum of all TO-values between
node i and others provides the summed topological
overlap of node i:

$$TO^{n}_{t,i} = \sum_{j=1, j \neq i}^{N} TO^{n}_{t,ij} \qquad (9)$$

For simplicity of representation, we drop the subscript
i for all indices. A more detailed description of this
index can be found in [33]. Calculations were performed
by the CosBiLAB Graph software [32]. Two thresholds
have been used, $t_1 = 0.01$ and $t_2 = 0.005$.
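
The summed topological overlap can be sketched in Python as follows. Two points are assumptions of this illustration rather than statements from the text: the interaction strength "up to n steps" is taken as the average of the 1- to n-step effects (the same quantities that enter the TI index), and a node may count among its own strong interactors if its effect on itself exceeds the threshold. The function names are hypothetical and the code is not the CosBiLAB Graph implementation.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def average_effects(adjacency, n_steps):
    """Return (nodes, E) where E[i][j] is the effect of j on i averaged over 1..n steps."""
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    one_step = [[0.0] * size for _ in range(size)]
    for v, neighbours in adjacency.items():
        for w in neighbours:
            one_step[index[v]][index[w]] = 1.0 / len(neighbours)
    power = [row[:] for row in one_step]
    total = [row[:] for row in one_step]
    for _ in range(n_steps - 1):
        power = matmul(power, one_step)
        total = [[total[i][j] + power[i][j] for j in range(size)] for i in range(size)]
    return nodes, [[x / n_steps for x in row] for row in total]


def summed_topological_overlap(adjacency, n_steps=3, threshold=0.01):
    """Sum over all other nodes j of the number of strong interactors shared by i and j."""
    nodes, effect = average_effects(adjacency, n_steps)
    size = len(nodes)
    # strong[i]: nodes receiving an effect larger than the threshold from node i.
    strong = [{j for j in range(size) if effect[j][i] > threshold} for i in range(size)]
    return {
        nodes[i]: sum(len(strong[i] & strong[j]) for j in range(size) if j != i)
        for i in range(size)
    }


# Toy example: on the path a - b - c with one step and t = 0.4,
# "a" and "c" share their only strong interactor "b".
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_to = summed_topological_overlap(path, n_steps=1, threshold=0.4)
```

Raising the threshold shrinks the effective ranges, which is one way to see why the two thresholds can rank nodes differently.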

Each of the six above-mentioned structural indices
was determined for every node in the networks. The 30
most central ones are presented for the HD network
(Table 3), the HO network (Table 4) and the total network
(Table 5). Additional file 1 presents all index
values for all nodes in these networks.

Since different network indices provide different rankings,
the question is how similar these rankings are.
Similarity indicates robust importance ranks (irrespective
of the index), while dissimilarity indicates that the
different indices carry complementary information.
For statistical analysis, we calculated the Spearman rank
correlation coefficient for each pair of the indices in the
three major networks (Table 1).
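
For readers who want to reproduce this kind of comparison, the Spearman coefficient can be computed as the Pearson correlation of the ranks. The sketch below is a minimal pure-Python version; assigning average ranks to tied values is an assumption, since the treatment of ties is not described here, and the coefficient is undefined (division by zero) if one of the inputs is constant.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical index values for four nodes (not data from this study).
degree_like = [0.9, 0.4, 0.4, 0.1]
closeness_like = [0.8, 0.5, 0.6, 0.2]
rho = spearman(degree_like, closeness_like)
```

Applied to two columns of index values for the same nodes, `spearman` gives one cell of a correlation table such as Table 1.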

In order to better understand the ranking of nodal
indices, we determined the distribution of each structural
index for each network. We present these distributions
for the total network in Figure 5. To test the
significance of the observed rank correlation coefficients,
we have constructed random networks. For
each of our observed networks (i.e. HD, HO, total), we
calculated the probability of two nodes being linked
together:

$$p = \frac{L}{(N^2 - N)/2} \qquad (10)$$

We have constructed 1000 random networks with
fixed N and link probability p. For each random network,
we calculated the same centrality indices and
determined the Spearman rank correlation coefficient
for each pair of centrality indices. Since we have 1000
random networks, for each pair of centrality indices we
thus have 1000 Spearman rank correlation coefficients.
From their distribution, we determined the mean and
the 95% confidence intervals. Results are summarized in
Table 2.
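
The randomization scheme can be sketched in Python as follows. The link probability follows equation (10); the rest is an assumption-laden illustration: each possible link is drawn independently with probability p, and the 95% interval is taken as the 2.5th to 97.5th percentile range of the simulated values, which may differ from how the intervals were actually obtained. Function names and the seed are hypothetical.

```python
import random
from itertools import combinations


def link_probability(n_nodes, n_links):
    """Equation (10): observed links divided by the number of possible node pairs."""
    return n_links / ((n_nodes ** 2 - n_nodes) / 2)


def random_network(n_nodes, p, rng):
    """Undirected random graph: each node pair is linked independently with probability p."""
    adjacency = {v: set() for v in range(n_nodes)}
    for u, v in combinations(range(n_nodes), 2):
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def mean_and_percentile_interval(values, level=0.95):
    """Mean plus an empirical percentile interval (assumed form of the 95% interval)."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1.0 - tail) * len(ordered)) - 1]
    return sum(ordered) / len(ordered), lower, upper


# Link probability for a network with the size of the HD network (N = 2142, L = 3537).
p_hd = link_probability(2142, 3537)

# A small random graph with a fixed seed, so that the example is repeatable.
rng = random.Random(2011)
small_random = random_network(50, 0.1, rng)
```

For the HD network this gives p of about 0.0015. Random graphs this sparse can be disconnected, so any index that relies on shortest paths between all pairs needs an explicit convention for unreachable nodes.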

For the top 30 nodes ranked by a particular index in a 
particular network, we quantified their biological func- 
tion by calculating the p-values of GO terms [34]. Speci- 
fically, we determined the ratio of the top 30 nodes that 
can be characterized by a certain GO term and com- 
puted the associated p-values (Table 6). Bold numbers 
mean p < 0.05. 






Additional material 



Additional file 1: Network indices for three networks. The values of 
the six network indices are given here for all nodes in the three 
networks. 

Additional file 2: The GO terms and p-values studied in this paper. 

The extracted GO terms and their statistics of proteins ranked by 
different structural indices for the HD network, the HO network and the 
total network. 